Extracting Characteristic Sentences from Related Documents
Authors
Abstract
The amount of available information keeps growing. To find a chance, i.e., an important event for decision-making, we have to be prepared for it. Recent progress in automatic summarization may contribute to Chance Discovery because it helps a user read many documents easily and thus be prepared for the chance. In this paper, we develop a new method for multi-document summarization that extracts a set of characteristic sentences maximizing coverage of the original content while minimizing redundancy in the summary. In addition to the summary, we provide a word co-occurrence graph that shows why the result was obtained.
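To make the coverage-versus-redundancy objective concrete, here is a minimal sketch of one common way to approximate it: greedy selection by newly covered words. This is an illustrative assumption, not the method developed in the paper; the function names `content_words` and `select_sentences`, the tiny stopword list, and the use of distinct words as the unit of coverage are all hypothetical choices made for the example.

```python
import re

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "on", "for"}


def content_words(sentence):
    """Return the set of lowercase words in a sentence, minus stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def select_sentences(sentences, k):
    """Greedily pick up to k sentences, each adding the most uncovered words.

    Scoring a sentence only by the words it adds rewards coverage and
    penalizes redundancy at the same time (hypothetical stand-in objective).
    """
    word_sets = [content_words(s) for s in sentences]
    covered = set()
    chosen = []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda i: len(word_sets[i] - covered))
        if not word_sets[best] - covered:
            break  # every remaining sentence would be fully redundant
        chosen.append(best)
        covered |= word_sets[best]
        remaining.remove(best)
    # Return the picked sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "The cat sat on the mat.",
    "The cat sat on the mat today.",
    "Dogs chase cats in the park.",
]
summary = select_sentences(docs, 3)
```

In this toy input the first sentence adds nothing beyond the second, so the sketch stops after two sentences even though three were allowed.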
Similar resources
Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM
We present a method capable of extracting parallel sentences from far more disparate “very-non-parallel corpora” than previous “comparable corpora” methods, by exploiting bootstrapping on top of IBM Model 4 EM. Step 1 of our method, like previous methods, uses similarity measures to find matching documents in a corpus first, and then extracts parallel sentences as well as new word translations ...
Extracting Paraphrases from Definition Sentences on the Web
We propose an automatic method of extracting paraphrases from definition sentences, which are also automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that the sentences that define the same concept tend to convey mostly the same information using different expressions and thus contain many paraphrases. We show that a large number of ...
Single Document Keyphrase Extraction Using Sentence Clustering and Latent Dirichlet Allocation
This paper describes the design of a system for extracting keyphrases from a single document. The principle of the algorithm is to cluster sentences of the document in order to highlight parts of text that are semantically related. The clusters of sentences, which reflect the themes of the document, are then analyzed to find the main topics of the text. Finally, the most important words, or grou...
Extracting Comparative Sentences from Korean Text Documents Using Comparative Lexical Patterns and Machine Learning Techniques
This paper proposes a method for automatically identifying Korean comparative sentences in text documents. It first investigates many comparative sentences with reference to previous studies and then defines a set of comparative keywords from them. A sentence that contains one or more elements of the keyword set is called a comparative-sentence candidate. Finally, we use machine learning technique...